 monte carlo experiment


Gradient Boosting for Spatial Panel Models with Random and Fixed Effects

arXiv.org Machine Learning

Due to the increase in data availability in urban and regional studies, various spatial panel models have emerged to model spatial panel data, which exhibit spatial patterns and spatial dependencies between observations across time. Although estimation is usually based on maximum likelihood or generalized method of moments, these methods may fail to yield unique solutions if researchers are faced with high-dimensional settings. This article proposes a model-based gradient boosting algorithm, which enables estimation with interpretable results that is feasible in low- and high-dimensional settings. Due to its modular nature, the flexible model-based gradient boosting algorithm is suitable for a variety of spatial panel models, which can include random and fixed effects. The general framework also enables data-driven model and variable selection as well as implicit regularization where the bias-variance trade-off is controlled for, thereby enhancing accuracy of prediction on out-of-sample spatial panel data. Monte Carlo experiments concerned with the performance of estimation and variable selection confirm proper functionality in low- and high-dimensional settings while real-world applications including non-life insurance in Italian districts, rice production in Indonesian farms and life expectancy in German districts illustrate the potential application.
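To give a feel for how model-based gradient boosting performs variable selection, here is a minimal sketch of component-wise L2 boosting for an ordinary linear model. It is a simplification and not the authors' algorithm: it has no spatial lag, no random or fixed effects, and no data-driven stopping rule. The function name `componentwise_l2_boost`, the step size `nu`, the number of steps, and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def componentwise_l2_boost(X, y, n_steps=200, nu=0.1):
    """Component-wise L2 boosting with one linear base learner per column.

    Illustrative sketch only: plain linear model, fixed number of steps.
    """
    X = X - X.mean(axis=0)            # center columns so the intercept is y.mean()
    intercept = y.mean()
    resid = y - intercept             # negative gradient of the squared-error loss
    beta = np.zeros(X.shape[1])
    col_ss = (X ** 2).sum(axis=0)     # per-column sum of squares
    for _ in range(n_steps):
        coefs = X.T @ resid / col_ss  # least-squares fit of each column to the residuals
        # pick the column whose fit reduces the residual sum of squares the most
        j = int(np.argmax(coefs ** 2 * col_ss))
        beta[j] += nu * coefs[j]      # weak update: only a fraction nu of the fit
        resid = resid - nu * coefs[j] * X[:, j]
    return intercept, beta

# Hypothetical high-dimensional example: more covariates than observations.
rng = np.random.default_rng(0)
n, p = 200, 300
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [3.0, -2.0, 2.0]
y = X @ beta_true + rng.normal(size=n)

intercept, beta_hat = componentwise_l2_boost(X, y)
selected = np.flatnonzero(beta_hat)   # covariates that were ever updated
```

Because each step updates a single coefficient, covariates that are never selected keep a coefficient of exactly zero, which is where the implicit variable selection and regularization come from. In practice the number of steps would be chosen by cross-validation or a similar criterion rather than fixed in advance.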


Online estimation methods for irregular autoregressive models

arXiv.org Artificial Intelligence

In recent decades, rapid technological growth has made it increasingly common for temporal data to accumulate quickly and in vast amounts. This provides an opportunity to extract valuable information by estimating increasingly precise models, but it also imposes the challenge of continuously updating the models as new data become available. Currently available methods for addressing this problem, the so-called online learning methods, use current parameter estimates and new data to update the estimators. These approaches avoid using the full raw data and thereby speed up the computations. In this work we consider three online learning algorithms for parameter estimation in the context of time series models. In particular, the methods implemented are gradient descent, Newton-step and Kalman filter recursions. These algorithms are applied to the recently developed irregularly observed autoregressive (iAR) model. The estimation accuracy of the proposed methods is assessed by means of Monte Carlo experiments. The results show that the proposed online estimation methods allow for precise estimation of the parameters that generate the data, for both regularly and irregularly observed time series. These online approaches are numerically efficient, allowing substantial savings in computation time. Moreover, we show that, unlike batch estimation methods, the proposed methods adapt the parameter estimates quickly when the behavior of the time series changes.
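The online update idea can be illustrated with a minimal sketch of online gradient descent for a regularly observed AR(1) process. This is an assumption-laden simplification, not the iAR model or the authors' implementation: the function name `online_ar1_gradient`, the constant learning rate, the clipping bounds, and the simulated series are all hypothetical.

```python
import numpy as np

def online_ar1_gradient(y, lr=0.005, phi0=0.0):
    """Update an AR(1) coefficient one observation at a time.

    Each step uses only the current estimate and the newest pair
    (y[t-1], y[t]); the full history is never revisited.
    """
    phi = phi0
    path = np.empty(len(y) - 1)
    for t in range(1, len(y)):
        err = y[t] - phi * y[t - 1]          # one-step prediction error
        phi += lr * err * y[t - 1]           # gradient step on 0.5 * err**2
        phi = float(np.clip(phi, -0.99, 0.99))  # stay in the stationary region
        path[t - 1] = phi
    return path

# Small Monte Carlo style check on one simulated series (hypothetical setup).
rng = np.random.default_rng(0)
n, phi_true = 5000, 0.7
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi_true * y[t - 1] + rng.normal()

path = online_ar1_gradient(y)
phi_late = path[-1000:].mean()   # average of the late estimates
```

A constant learning rate keeps the estimate noisy but lets it track changes in the generating process, while a decaying rate gives smoother estimates that adapt more slowly. The Newton-step and Kalman filter recursions mentioned in the abstract are not shown here.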


Landcover classification using LiDAR and Hyperspectral data Fusion

#artificialintelligence

Learn to perform robust landcover classification using the fusion of hyperspectral and LiDAR data. This article is Part 3 in the Landcover classification series. In the 1st part, we learned about using a single pixel from LiDAR for landcover classification. In the 2nd part, we learned to use an NxN neighborhood around the pixel from LiDAR for classification. In this article, we will use the fusion of Hyperspectral Imagery (HSI) and LiDAR data to improve the classification performance. The idea is that merging information from multiple sensors provides more insight into the region of interest.
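One common way to fuse the two sources is at the feature level: stack each pixel's spectral vector with its LiDAR value and feed the result to a classifier. The sketch below assumes this approach, which the teaser does not confirm; the function name `fuse_pixelwise`, the array shapes, and the random data are hypothetical, and the two rasters are assumed to be co-registered on the same grid.

```python
import numpy as np

def fuse_pixelwise(hsi, lidar):
    """Stack HSI bands and a LiDAR raster into one feature row per pixel.

    hsi:   array of shape (height, width, bands)
    lidar: array of shape (height, width), e.g. an elevation raster
    Returns a standardized array of shape (height * width, bands + 1).
    """
    if hsi.shape[:2] != lidar.shape:
        raise ValueError("HSI and LiDAR rasters must share the same grid")
    h, w, bands = hsi.shape
    spectral = hsi.reshape(h * w, bands).astype(float)
    height = lidar.reshape(h * w, 1).astype(float)
    fused = np.hstack([spectral, height])
    # Standardize each feature so reflectance and elevation are on comparable scales.
    mean = fused.mean(axis=0)
    std = fused.std(axis=0)
    std[std == 0] = 1.0
    return (fused - mean) / std

# Hypothetical toy rasters: a 4x5 scene with 10 spectral bands.
rng = np.random.default_rng(0)
hsi = rng.random((4, 5, 10))
lidar = rng.random((4, 5)) * 30.0
features = fuse_pixelwise(hsi, lidar)
```

The resulting feature matrix can be passed to any per-pixel classifier. An NxN neighborhood variant would flatten a patch around each pixel instead of a single value.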


High-dimensional mixed-frequency IV regression

arXiv.org Machine Learning

Technological progress over the past decades has made it possible to generate, collect, and store new intraday high-frequency time series datasets that are widely available along with the "old" low-frequency data. Indeed, economic activity occurs in real time and economic and financial transactions are frequently recorded instantaneously, while traditional time series data are available at quarterly, monthly, or sometimes daily frequencies. Ignoring the high-frequency nature of the data leads to a loss of information through temporal aggregation and makes it impossible to quantify economic activity in real time. At the same time, combining the low- and high-frequency datasets yields more refined measures of economic activity that can subsequently be used to inform market participants and to guide policies. In this paper, we introduce a novel high-dimensional mixed-frequency instrumental variable (IV) regression suitable for datasets recorded at different frequencies. The model connects a low-frequency dependent variable to endogenous covariates sampled from a continuous-time stochastic process. Alternatively, the regressor might be sampled from a continuous-space stochastic process encountered in spatial data analysis or from any other stochastic process indexed by the continuum. This leads to a high-dimensional IV regression with a large number of endogenous regressors.


Efficient Neutrino Oscillation Parameter Inference with Gaussian Process

arXiv.org Machine Learning

Neutrino oscillation studies involve inference from tiny samples of data that have complicated dependencies on multiple oscillation parameters simultaneously. This is typically carried out using the unified approach of Feldman and Cousins, which is very computationally expensive, on the order of tens of millions of CPU hours. In this work, we propose an iterative method using a Gaussian process to efficiently find a confidence contour for the oscillation parameters and show that it produces the same results at a fraction of the computation cost.


Orthogonal Random Forest for Heterogeneous Treatment Effect Estimation

arXiv.org Machine Learning

We study the problem of estimating heterogeneous treatment effects from observational data, where the treatment policy on the collected data was determined by potentially many confounding observable variables. We propose orthogonal random forest, an algorithm that combines orthogonalization, a technique that effectively removes the confounding effect in two-stage estimation, with generalized random forests [Athey et al., 2017], a flexible method for estimating treatment effect heterogeneity. We prove a consistency rate result of our estimator in the partially linear regression model, and en route we provide a consistency analysis for a general framework of performing generalized method of moments (GMM) estimation. We also provide a comprehensive empirical evaluation of our algorithms, and show that they consistently outperform baseline approaches.
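The orthogonalization step can be illustrated with a minimal residual-on-residual sketch in a partially linear model with a constant treatment effect. This is not orthogonal random forest: it uses linear least squares for both nuisance fits, omits the forest and the cross-fitting that would normally accompany orthogonalization, and estimates no heterogeneity. The function names, coefficients, and simulated data are assumptions made for illustration.

```python
import numpy as np

def residualize(target, W):
    """Remove the part of `target` that is linearly explained by confounders W."""
    design = np.column_stack([np.ones(len(W)), W])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return target - design @ coef

def orthogonal_effect(y, t, W):
    """Two-stage estimate: regress outcome residuals on treatment residuals."""
    y_res = residualize(y, W)   # stage 1a: partial the confounders out of the outcome
    t_res = residualize(t, W)   # stage 1b: partial the confounders out of the treatment
    return float(t_res @ y_res / (t_res @ t_res))  # stage 2: slope through the origin

# Hypothetical observational data: treatment depends on the confounders W.
rng = np.random.default_rng(1)
n, theta_true = 4000, 2.0
W = rng.normal(size=(n, 3))
t = W @ np.array([1.0, -0.5, 0.2]) + rng.normal(size=n)
y = theta_true * t + W @ np.array([1.0, -1.0, 0.5]) + rng.normal(size=n)

theta_hat = orthogonal_effect(y, t, W)
theta_naive = float(t @ y / (t @ t))   # ignores confounding, for comparison
```

In this simulation the naive slope is biased because the confounders drive both treatment and outcome, while the residual-on-residual estimate recovers a value close to the true effect. Replacing the linear nuisance fits with flexible learners and localizing the second stage is the direction the abstract describes.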